Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
Most of the world's languages suffer from a paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. Instead, an alternative paradigm could take inspiration from the propensity of children to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of inborn inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both of these intuitions.
Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models.
Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations.
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance in new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages by means of sample efficiency and robustness to cross-lingual variation.
ERC (Consolidator Grant 648909) Lexical
Google Research Faculty Award 201
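The modular design described above can be illustrated with a toy sketch. All names and shapes here are assumptions for illustration, not the thesis's actual model: a weight vector for each (language, task) combination is generated from a language latent and a task latent through a shared map, so a combination never observed during training can still be composed from its factors.

```python
import numpy as np

# Hypothetical sketch of the factorised latent-variable idea: weights for a
# (language, task) pair are generated from separate language and task latents
# through a shared generator (a placeholder for a learned neural network).
rng = np.random.default_rng(0)

n_langs, n_tasks, d_latent, d_weights = 4, 3, 8, 16
z_lang = rng.normal(size=(n_langs, d_latent))   # one latent per language
z_task = rng.normal(size=(n_tasks, d_latent))   # one latent per task
G = rng.normal(size=(2 * d_latent, d_weights))  # shared generator map

def generate_weights(lang_idx, task_idx):
    """Compose language- and task-specific weights from the two latents."""
    z = np.concatenate([z_lang[lang_idx], z_task[task_idx]])
    return z @ G

# A (language, task) pair never seen together still yields a weight vector,
# because each factor can be inferred from other observed combinations.
w_unseen = generate_weights(3, 2)
print(w_unseen.shape)  # (16,)
```

The key design choice is that the latents are tied across combinations: data for (language A, task 1) and (language B, task 2) jointly constrain the factors needed to predict (language A, task 2).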
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data.
Comment: Accepted at EMNLP 201
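The combined objective can be sketched in miniature. This is a simplified illustration under assumed shapes, not the authors' implementation: the specialization function is a placeholder linear map, the discriminator a logistic scorer, and the generator-side adversarial term is added to the L2 distance on seen words.

```python
import numpy as np

# Toy sketch of combining an L2-distance loss with a generator-side
# adversarial loss (all weights here are random placeholders for
# illustration, not trained components).
rng = np.random.default_rng(1)
d = 5

x = rng.normal(size=d)        # distributional vector of a seen word
target = rng.normal(size=d)   # its lexicon-specialized counterpart
W = np.eye(d)                 # global specialization function (placeholder)
w_disc = rng.normal(size=d)   # hypothetical discriminator weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_loss(x, target, W, w_disc, lam=0.1):
    out = W @ x                                   # specialized output vector
    l2 = np.sum((out - target) ** 2)              # distance to the target
    adv = -np.log(sigmoid(w_disc @ out) + 1e-9)   # reward "realistic" outputs
    return l2 + lam * adv

loss = combined_loss(x, target, W, w_disc)
```

In training, the discriminator would be updated adversarially to distinguish specialized vectors from generator outputs; with `lam=0` the objective reduces to the plain L2 post-specialization loss.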
Corpus-based measures discriminate inflection and derivation cross-linguistically
Japanese passives are traditionally considered to have two types: direct and indirect passives. However, more recent studies, such as Ishizuka (2012), suggest the two types can be unified under the same syntactic movement analysis. Utilizing the Balanced Corpus of Contemporary Written Japanese (BCCWJ; Maekawa, 2008; Maekawa et al., 2014), this study aims to investigate how likely different types of passives are to appear in naturally occurring texts, especially in relation to a markedness-based hierarchy called the Noun Phrase Accessibility Hierarchy (NPAH; Keenan and Comrie, 1977), and to investigate whether true indirect passives occur in contemporary written Japanese.
Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP
The transfer or sharing of knowledge between languages is a popular solution to resource scarcity in NLP. However, the effectiveness of cross-lingual transfer can be challenged by variation in syntactic structures. Frameworks such as Universal Dependencies (UD) are designed to be cross-lingually consistent, but even in carefully designed resources, trees representing equivalent sentences may not always overlap. In this paper, we measure cross-lingual syntactic variation, or anisomorphism, in the UD treebank collection, considering both morphological and structural properties. We show that reducing the level of anisomorphism yields consistent gains in cross-lingual transfer tasks. We introduce a source language selection procedure that facilitates effective cross-lingual parser transfer, and propose a typologically driven method for syntactic tree processing which reduces anisomorphism. Our results show the effectiveness of this method for both machine translation and cross-lingual sentence similarity, demonstrating the importance of syntactic structure compatibility for boosting cross-lingual transfer in NLP.
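One simple proxy for structural anisomorphism can be sketched as follows. This is an assumed toy metric for illustration, not the paper's exact measure: given two dependency trees over word-aligned sentences, count how often aligned words agree on (aligned) head attachment; the disagreement rate indicates how far the trees are from isomorphic.

```python
# Toy proxy for structural anisomorphism between word-aligned dependency
# trees: the fraction of aligned words whose heads also align. Heads are
# parent indices, with -1 marking the root.
def head_agreement(heads_src, heads_tgt, align):
    """align maps source token indices to target token indices."""
    agree = total = 0
    for s, t in align.items():
        total += 1
        hs, ht = heads_src[s], heads_tgt[t]
        if (hs == -1 and ht == -1) or (hs in align and align[hs] == ht):
            agree += 1
    return agree / total

# Two 3-word sentences, fully aligned; they disagree on one attachment.
heads_a = [1, -1, 1]   # words 0 and 2 attach to word 1 (the root)
heads_b = [1, -1, 0]   # word 2 attaches to word 0 instead
score = head_agreement(heads_a, heads_b, {0: 0, 1: 1, 2: 2})
print(score)  # 0.666... agreement; 1 - score is the anisomorphism proxy
```

A morphological analogue would compare feature inventories rather than attachments; both views are considered in the paper.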
Efficient Transformers with Dynamic Token Pooling
Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments of tokens. Nevertheless, natural units of meaning, such as words or phrases, display varying sizes. To address this mismatch, we equip language models with a dynamic-pooling mechanism, which predicts segment boundaries in an autoregressive fashion. We compare several methods to infer boundaries, including end-to-end learning through stochastic re-parameterisation, supervised learning (based on segmentations from subword tokenizers or spikes in conditional entropy), as well as linguistically motivated boundaries. We perform character-level evaluation on texts from multiple datasets and morphologically diverse languages. The results demonstrate that dynamic pooling, which jointly segments and models language, is both faster and more accurate than vanilla Transformers and fixed-length pooling within the same computational budget.
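The pooling step itself can be sketched in a few lines. The shapes and the mean-pooling choice here are assumptions for illustration, not the paper's code: given per-token hidden states and binary boundary indicators, each variable-length segment is collapsed into a single vector, shortening the sequence for the intermediate layers.

```python
import numpy as np

# Sketch of dynamic pooling: mean-pool each variable-length segment delimited
# by predicted boundary indicators (1 marks the start of a new segment).
def dynamic_pool(hidden, boundaries):
    """hidden: (T, d) array of token states; boundaries: length-T 0/1 list."""
    segments, current = [], []
    for h, b in zip(hidden, boundaries):
        if b and current:                      # a new segment begins: flush
            segments.append(np.mean(current, axis=0))
            current = []
        current.append(h)
    segments.append(np.mean(current, axis=0))  # flush the final segment
    return np.stack(segments)

hidden = np.arange(12, dtype=float).reshape(6, 2)   # 6 tokens, d = 2
pooled = dynamic_pool(hidden, [1, 0, 0, 1, 0, 1])   # segments of length 3, 2, 1
print(pooled.shape)  # (3, 2)
```

In the full model the boundary indicators are predicted autoregressively rather than given, and the shortened sequence is later upsampled back to character resolution.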